The Best Laid Plans: 

Principal Incentive Programs in the Teacher Incentive Fund 

Peter Goff, Ellen Goldring, and Melissa Canney 


In an era of heightened accountability and limited fiscal resources, school districts have 
sought novel ways to increase the effectiveness of their principals in an effort to increase student 
proficiency. To address these needs, some districts have turned to pay-for-performance 
programs, aligning leadership goals with financial incentives to motivate and direct leadership 
efforts. 


Pay-for-performance strategies have been applied to schools for decades (Barraclough, 
1973; Educational Research Service, 1979; Kienapfel, 1984) but have expanded in scope and 
scale to now operate through public and private channels at the national, state, and district levels. 
These incentive programs have historically focused on teachers, but in recent years they include 
principals as well. 

The research presented here examines pay-for-performance plans for school principals, 
presenting the plans’ key features and defining programmatic elements within a framework that 
focuses on key decisions that need to be made while designing incentive systems. In so doing, 
this analysis provides a descriptive overview of how pay-for-performance programs for school 
principals are conceptualized and developed while illustrating novel approaches, common 
shortcomings, and creative solutions to challenging dilemmas. 

We base our analyses on the prevailing literature on teacher performance pay and related 
work from the business and public sectors to develop a framework of essential components that 
we apply, via document analysis, to 34 funded proposals from the federal Teacher Incentive 
Fund (TIF). This framework does not dictate the form of performance pay systems; rather, it 
identifies the requisite considerations with which system designers must grapple as they 
construct the incentive architecture. We use this approach to address the following research 
questions: 

1. What are the defining characteristics of pay-for-performance programs for school 
leaders, as conceived by practitioners across the country? 

2. To what extent do practitioner-developed pay-for-performance plans for school 
leaders align with a robust design framework? 

We chose to analyze the TIF grants because the program is one of the largest and most 
prominent avenues open to all school districts in the United States to implement incentive pay 
for principals. Through this analysis we learn about the prevailing pay-for-performance plans for 
principals and reach conclusions about their strengths, weaknesses, and likelihood of success. 
Principal Incentives through the Teacher Incentive Fund 

The federal government initiated TIF in 2006. By the end of 2012, the fund had awarded 
more than $1.5 billion in grant monies aimed at providing high-need schools funds and technical 
assistance to link performance measures to monetary rewards for principals and teachers. TIF 
requires all proposals to include incentive plans for principals. All 34 funded proposals 
from the first two of three rounds of TIF competition constitute the basis for this analysis.¹ 

To support applicants in crafting their proposals, the U.S. Department of Education 
offered technical workshops, conferences, and webinars to review selection criteria and 
requirements. For principals’ performance compensation, proposal guidelines required that 
applicants must (a) give significant weight to student growth in achievement, (b) include 
observation-based assessments of principals’ performance at multiple points in the year, (c) 
demonstrate that incentive payments will be substantial and justify the level of incentive amounts 
chosen, and (d) provide evidence that the proposed plan aligns with a coherent and integrated 
strategy for strengthening the educator workforce. 

TIF applicants were supplied with a scoring rubric to aid in the preparation of their plans. 
Fifty of a possible 100 points were to be awarded for the quality of the program design. To 
obtain the full 50 points, applicants had to (a) link performance to changes in student 
achievement, (b) describe how the program develops principals (and teachers) while building 
capacity, (c) use valid and reliable measures, and (d) implement a fair and rigorous performance 
evaluation program. As outlined in more detail below, these selection criteria of the TIF are in 
keeping with the U.S. Department of Education’s educational goals as well as key components of 
a design framework for developing pay-for-performance systems. 

Incentive System Design 

The goal of this inquiry is to document the proposed practices and programmatic features 
used in leadership incentive systems across the nation. The design framework we construct here 
outlines the most important considerations that must be engaged when developing pay-for- 
performance systems. In some domains of our design framework, such as the reliability of 
measures, predefined benchmarks and standards exist. However, for many other design elements, 
there is no singular correct feature, and throughout our framework, we emphasize the options, 
rather than the result. We use our design framework to present common options and the trade- 
offs that need to be considered when selecting an option within any design element. 

We view each of the design elements we present below to be essential to a well-defined 
pay-for-performance system. That is, neglecting any particular element is detrimental to the 
incentive program overall. This strategy allows our analysis to be descriptive (identifying the 
various choices each organization made in its pay-for-performance system) while also 
evaluative, as organizations that have not articulated their preferred approaches have 
underdeveloped designs. 

¹ At the time of this analysis, proposals funded for Round 3 had not yet been released. 

System Elements 

A growing body of evidence relates to the design, implementation, outcomes, and 
effectiveness of teacher incentive pay programs (Springer & Balch, 2009). In contrast, relatively 
little has been published about the design of incentive pay structures for school principals and 
other educational leaders. Nevertheless, as noted above, the TIF requires applicants to include 
incentive pay structures for both teachers and principals in their proposals. To better understand 
the unique position of principals the TIF grants cover, we refer to published literature on 
principal pay and evaluation as well as incentive and accountability design structures in 
education broadly. 

A review of compensation and evaluation literature reveals three broad categories for a 
design framework of incentive systems: measurement, rewards, and program structure (Odden & 
Kelley, 1997; Podgursky & Springer, 2007; Springer & Balch, 2009). Measurement refers to the 
domains of measurement, the quality of measures, the capacity of evaluators (those 
implementing the measures), and the frequency of measurement. Reward considerations include 
the reward type, size, and frequency. Finally, program structure includes competitive structure, 
benchmarks, and the linkage of performance to rewards. Each of these categories is discussed 
further below. 

Measurement. 

Domains of measurement. The multidimensional nature of school leadership requires an 
evaluation process that can measure the varied leadership dimensions effectively (Smither, 
London, & Reilly, 2005). At the same time, dimensions should be chosen carefully and used 
judiciously, as too many may lead to confusion in interpreting results or supporting payout 
decisions (Gerhart & Milkovich, 1992). The challenge lies in determining the optimal balance of 
leadership measures. Goldring et al. (2009) emphasize the complexity of the principal’s role and 
duties, and they suggest that evaluations measure four domains of principal effectiveness: 
responsibilities, knowledge, processes, and organizational outcomes. 

Quality of measures. Regardless of the domains, the measurement instruments used are 
expected to meet minimum standards of validity and reliability (Carmines & Zeller, 1979). 
Measurement validity is demonstrated by providing evidence that the instrument accurately 
measures the domain it purports to measure; reliability is demonstrated by providing evidence 
that the instrument functions with a small error variance, that is, the instrument is precise 
(Milgrom & Roberts, 1992). 

Evaluator capacity. Principal assessments often presume that evaluators have the 
expertise to perform sound leadership assessment based on their position as supervisors in the 
hierarchy. However, in practice, districts may not specifically identify who performs the 
evaluation or fail to provide adequate training or support to the evaluator (Goldring et al., 2009). 
Such ambiguity can be problematic if district staff lack the appropriate skills to evaluate school 
leadership, leading to miscommunication in the expectations of the principal evaluation process. 
Kienapfel (1984) writes that ideally, principals should be assessed by a well-trained immediate 
supervisor to facilitate the communication of expectations in both directions throughout the year. 
Any evaluation process that relies on the professional judgment of an individual should clearly 
articulate who is conducting the evaluation and what training has been provided. 

It is now commonplace for measures of student growth to be associated with teachers and 
principals. The statistical modeling required to generate value-added estimates can be technically 
demanding: Just as supervisors may need training to accurately evaluate school leaders, data 
management personnel may need assistance constructing valid school or teacher value-added 
models, either through training or by contracting with educational consultants who specialize in 
quantitative modeling or observational evaluation techniques. 

Frequency of measurement. In almost every state, principals are evaluated at least once 
annually (Goldring et al., 2009), although assessments can occur more frequently. When 
measurements are evaluative in nature and are intended to serve as incentives through 
performance bonuses, principals’ effectiveness and leadership should be measured more than 
once per year (Murphy, 1999). These assessments can be either formative or summative, and 
may consider different types of data at different points throughout the year. 

Reward considerations. 

Type of reward. Individuals in the public sector may be less driven by financial rewards 
than those in the private sector, suggesting that alternative forms of compensation may be 
feasible (Borjas, 2002). Social psychological literature on public service motivation argues that 
these individuals may not respond to incentives related to performance or commitment; rather, 
they seek to contribute to the public good to satisfy personal needs (Courty, Heinrich, & 
Marschke, 2005; Perry & Porter, 1982; Perry, Hondeghem, & Wise, 2010; Rainey, 1982). The 
same logic likely holds in education, where teachers may be less responsive to financial 
incentives. As a result, many school systems reward teachers for exceptional performance by 
offering improved working conditions, paid leave, mentoring and induction programs, and job 
expansion (Springer & Balch, 2009). Because the overwhelming majority of principals are 
former teachers, it is likely that principals also view their profession as a form of stewardship, 
suggesting that non-monetary rewards could be used to motivate their performance as well. 

Reward amount. Although program designers may want to consider integrating some 
non-pecuniary incentives, in practice monetary rewards often dominate incentive plans. Despite 
the prominence of financial rewards, there is limited evidence regarding the optimal size of 
monetary bonuses for principals within incentive pay programs; however, several studies on the 
size of teacher incentive pay programs may provide useful insights into program design. 

An analysis of the Texas Educator Excellence Grant found that bonus awards ranged from 
0.4% to 365% of a teacher’s monthly salary (Springer & Balch, 2009). Translated into dollar 
amounts, these awards ranged from $20 to $20,462, with the majority of teachers awarded 
between $1,000 and $3,000 (Springer et al., 2008). Relative to total salary, this represents 
approximately 2-6% for early career teachers and 1-3% for more experienced teachers. Several 
studies suggested that, in general, bonus awards for teachers were so small that the motivational 
value of most incentive systems had been compromised (Chamberlin, Wragg, Haynes, & Wragg, 
2002; Heinrich, 2007; Malen, 1999; Taylor & Springer, 2009). Evidence from the Project on 
Incentives in Teaching study in Nashville, Tennessee, suggests that even large incentives for 
teachers (a possible $5,000-$15,000 awarded for performance between the 80th and 100th 
percentile in middle school math achievement) do not necessarily translate into student 
achievement gains. However, the study examined only the influence of an incentive pay program 
for student achievement and did not address other factors related to teaching and learning. In 
contrast, Fryer, Levitt, List, & Sadoff (2012) applied a loss-aversion framework, providing 
teachers with a bonus and requiring the bonus be paid back if certain levels of student growth 
were not reached. The authors found significant growth in student academic achievement when 
teachers’ maximum bonuses were $8,000 and expected bonuses were $4,000. Collectively, these 
findings suggest that reward amounts of $4,000 per year may be adequate to induce measurable 
change in performance, but the reward amount must be considered in conjunction with other 
elements of the incentive system. 

Frequency and timing of payment. Hollensbe and Guthrie (2000) found that most 
incentive-pay programs in the United States distribute awards only once a year, as they are often 
dependent on end-of-year assessments. Springer and Balch (2009) note the practicality of 
aligning performance awards with the end-of-year assessments while advocating for reducing the 
time between assessments and award to create a more transparent link between action and 
reward. Fryer et al. (2012) provide an interesting counterpoint to the conventional order of 
perform, assess, reward. In a randomized experiment, they provided teachers with their bonuses 
first, with the prospect of losing the bonus in the future if targets were not met, building upon 
loss-aversion literature. This approach showed modest increases in student performance for 
teachers who received the bonuses. Pay-for-performance programs need to be clear about how 
often rewards are distributed, balancing the need to provide the reward close to the assessment 
with other logistical considerations. 

Program structure. 

Part of the logic underlying pay-for-performance systems is that they reward individuals 
for meeting predefined performance goals. How practitioners structure goals is a key 
consideration in the design of compensation systems. Here we consider the competitive structure 
(group or individual rewards), the importance of setting benchmarks (normative or criterion 
referenced), and definitions of the structure that links performance to rewards. 

Competitive structure. The compensation literature includes substantial debate on the 
trade-offs between group versus individual incentives. This topic is particularly important given 
evidence that educators may be somewhat less motivated by financial rewards and more 
motivated by collaboration. The logic of supporting a group approach among educators is that 
peer motivation, an enhanced feeling of teamwork, and the belief that the benefits of one’s 
contribution extend beyond the individual each complement the motivation of financial 
incentives (Guthrie & Hollensbe, 2004). For educators who are motivated by an ethos of 
stewardship and have a professional drive to support their community, group rewards may have 
noticeable benefits. While individual incentives have been shown to be superior for 
accountability purposes (by mitigating the problems of free-ridership), they may stifle 
collaboration in ways that group incentives do not (Blinder, 1989; Brown & Heywood, 2002; 
Brown & Armstrong, 1999). 

The concern that competition may limit collaboration for school principals is mitigated 
by the organizational structure of schools, as principals can only be compared across (rather than 
within) schools. Nonetheless, a distributed leadership perspective (Spillane, Halverson, & 
Diamond, 2001) suggests that incentives designed for the leadership team rather than the 
principal alone may better promote within-school collaboration. An alternative strategy to 
introduce a group dynamic might be to make principal rewards a function of teacher rewards. 

Setting benchmarks. When considering the competitive structure, designers of pay-for- 
performance programs must choose between criterion-referenced or norm-referenced 
benchmarks (Lavy, 2002). A norm-referenced structure that rewards the top 20% of principals 
creates competition, which may enhance program efficacy. One drawback to the norm- 
referenced system is that all the principals may fall below an absolute measure of effectiveness 
and the top portion will still be rewarded. Similarly, all principals may achieve above 
expectations and still only the top portion receives bonuses. 

Compensation systems with criterion-referenced benchmarks can create budgeting 
challenges because all the participants could qualify for rewards, at significant cost to the district. 
However, this approach may better reflect the organizational culture of schools, which are 
heavily invested in student, teacher, and leadership standards. Before implementing a norm- 
referenced competitive structure, program administrators should have a reasonable understanding 
of how principals’ performance is distributed relative to a benchmark and how this distribution is 
likely to change with the introduction of performance incentives. Criterion-referenced standards 
also require districts to clearly define their performance expectations of school leaders. 
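
The contrast between the two benchmark types can be illustrated with a minimal sketch. The principal labels, evaluation scores, 20% share, and cut score of 80 are hypothetical values chosen for illustration; none are drawn from a TIF proposal.

```python
import math

def norm_referenced_winners(scores, top_share=0.20):
    """Reward the top share of principals regardless of absolute performance."""
    n_winners = max(1, math.ceil(len(scores) * top_share))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n_winners])

def criterion_referenced_winners(scores, cut_score=80):
    """Reward every principal whose score meets a fixed standard."""
    return {name for name, score in scores.items() if score >= cut_score}

# Hypothetical evaluation scores (0-100 scale) for five principals.
weak_year = {"A": 55, "B": 60, "C": 62, "D": 70, "E": 58}
strong_year = {"A": 85, "B": 90, "C": 92, "D": 95, "E": 88}

print(norm_referenced_winners(weak_year))         # one winner although nobody reaches 80
print(criterion_referenced_winners(weak_year))    # no winners
print(norm_referenced_winners(strong_year))       # still only one winner
print(criterion_referenced_winners(strong_year))  # all five qualify
```

In the weak year the norm-referenced rule still pays its top performer, and in the strong year the criterion-referenced rule pays everyone, which is the budgeting exposure described above.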

Linking performance to rewards. The complexity of mapping performance to rewards is 
exemplified by three scenarios using only one measure, student academic achievement. First, 
consider the scenario where a principal receives $5,000 if 75% of his or her students are 
proficient or above in reading on the state exam. In a second scenario, a principal receives $66 
for each percentage point of students at or above proficient, up to $5,000. In the third scenario, a 
principal receives $100 for each percentage point of their students at or above proficient, but this 
bonus does not start until 25% of students are at or above proficient and is capped at $5,000. In 
each scenario, principals receive roughly $5,000 for having 75% of their students at or above proficient, 
yet the mechanisms linking performance to reward — and the implications for policy and 
practice — differ. In the first scenario, we can generalize to a binary, all-or-nothing linking 
mechanism that gives the complete reward for demonstrated performance over a certain level. As 
illustrated by the second scenario, rewards could also be linked along a continuous scaling 
function (in this case, linear) where a unit increase on the assessment is proportional to the unit 
increase in reward. In the third scenario, we introduce the potential for variation, showing a 
linking mechanism that does not start generating rewards until a minimum threshold is reached. 
Other variations clearly exist: Programs can be designed to provide a lump-sum bonus for 
reaching a predefined level and then provide incremental rewards; piecewise functions are also 
an option, where principals may receive $50 per percentage point of their students above 
proficient, but $75 for students above the advanced level. 
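
The three scenarios can be written as simple reward functions, which makes their differences easy to compare at any performance level. This is a sketch only: the function names are ours, and the dollar figures are the illustrative values used in the scenarios above.

```python
def binary_reward(pct_proficient, bar=75, bonus=5000):
    """Scenario 1: all-or-nothing payment once the bar is met."""
    return bonus if pct_proficient >= bar else 0

def linear_reward(pct_proficient, rate=66, cap=5000):
    """Scenario 2: a fixed amount per percentage point, up to a cap."""
    return min(rate * pct_proficient, cap)

def threshold_reward(pct_proficient, floor=25, rate=100, cap=5000):
    """Scenario 3: payment starts only above a floor and is capped."""
    return min(max(pct_proficient - floor, 0) * rate, cap)

# Compare the three mechanisms at several proficiency rates.
for pct in (20, 50, 75, 90):
    print(pct, binary_reward(pct), linear_reward(pct), threshold_reward(pct))
```

At 50% proficient the three mechanisms pay $0, $3,300, and $2,500 respectively, even though they converge near 75%. With a rate of exactly $66 per point, the linear scenario pays $4,950 at 75% proficient, slightly under the $5,000 reached in the other two scenarios.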

When additional measures are introduced, we can think of additional variations on the 
above linking strategies. Pay-for-performance systems may develop “gateway” measures where 
a level of performance on one or more tasks must reach a minimum threshold before the 
participant is eligible for the pay-for-performance rewards. Such gateway measures may be 
attendance at professional development sessions or demonstrating progress on high-stakes 
exams. 
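
A gateway measure can be sketched as an eligibility check applied before any reward is paid. The gateway names and the $5,000 reward below are hypothetical and serve only to show the logic.

```python
def gated_reward(earned_reward, gateways):
    """Pay the earned reward only if every gateway condition is met."""
    return earned_reward if all(gateways.values()) else 0

# Hypothetical gateway checks for one principal.
gateways = {
    "attended_required_pd_sessions": True,
    "showed_progress_on_state_exam": False,
}
print(gated_reward(5000, gateways))  # 0, because one gateway is unmet
```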


Springer and Balch (2009) suggest the above approaches are promising for defining the 
required level of performance. Each linking strategy has advantages and challenges. Binary, or 
all-or-nothing, linking systems, for example, may encourage a narrow focus on the minimum 
bar, neglecting the possibility to reward growth up to or below that bar. The benefits of all-or- 
nothing linking systems lie in their ability to create a clear and readily definable goal for 
principals to work toward. 

Table 1 summarizes the key criteria that should be considered when designing a 
performance pay system that focuses on measurement, rewards, and program structure. 

TIF Proposals 

This study examines all 34 funded TIF proposals from Rounds 1 and 2 of the program 
representing 1,315 high-need urban or rural schools as described in the appendix. The average 
number of schools covered by a proposal was 34, although the numbers ranged from one to 116 
schools. Three proposals represented charter schools (n=59); all other proposals were for 
traditional public schools. Additional information on the funded grants we examine is presented 
in the appendix. 


Methods 

This study used a directed content analysis approach (Hsieh & Shannon, 2005), in which 
we extracted a pay-for-performance framework from extant literature on employee compensation 
and incentive plans. Before preparing any description, two authors read all proposals in their 
entirety to acquaint themselves with their form and substance. After reading all the proposals, we 
revised our framework, as it was clear that the proposals did not contain the level of specificity 
needed to adequately address all the points in the initial coding plan.² Two authors read and 
coded the proposals independently and discussed with the third author any discrepancies to 
arrive at a mutually agreeable understanding of the material and to ensure consistency in 
coding. 


Using concepts in the literature, we coded our data using a three-stage framework. The 
first stage focused on elements of measurement, the second stage examined performance 
rewards, and the third stage attended to the structural elements of the plan. 


Table 1. Summary of key elements of pay-for-performance plans 

Element: Questions and Considerations 

Measurement 

  Domains of Measurement: How many measures are used? What areas of leadership are being 
  measured? 

  Quality of Measures: Have the measures been used in other settings? What evidence is 
  available to support instrument validity? What evidence is available to support instrument 
  reliability? 

  Evaluator Capacity: Who is conducting the evaluations? Who is constructing the growth 
  models? What training or supports are needed to ensure a valid evaluation process? 

  Measurement Frequency: How frequently are measures collected? How soon can feedback be 
  provided? 

Reward Considerations 

  Reward Type: Will all rewards be monetary? What other rewards might motivate 
  participants? 

  Reward Amount: How large is the ideal bonus? Is the bonus sustainable? 

  Reward Timing and Frequency: How often are bonuses paid? Are bonuses paid by each 
  measure or is the incentive one payment for a collective group of activities? 

Program Structure 

  Competitive Structure: Will participants respond best to group or individual incentives? 

  Setting Benchmarks: Will excellence be defined as performance relative to peers or will 
  excellence be defined as performance relative to a set standard? 

  Linking Performance to Rewards: How will demonstrable performance be linked to 
  performance incentives? 


² Our initial coding scheme had, for example, several categories regarding key methodological and statistical 
considerations for measures of student academic growth. With few exceptions, proposals did not include such 
information upon which we could determine whether student growth measures were valid or reliable. Similar 
omissions and vagaries in other domains led us to develop the more general coding categories we have applied here. 




Designing Incentive Programs for School Leaders 


Results 

Measures 

Following the framework outlined above, this section details the measures included in the 
TIF proposals, addressing the number of measures used, the quality of the measures, evaluator 
capacity, and the frequency with which measures are collected. The plans for pay-for- 
performance systems for school principals considered many common measures, including 
improving raw student achievement, achieving various conceptions of student achievement 
growth, providing professional development, providing leadership coaching, facilitating the 
teacher incentive program, undertaking teacher performance reviews, and creating staffing 
incentives. To some extent, the variety in measures is an artifact of the TIF guidelines, which 
stipulate that all proposals must (a) give significant weight to student growth and (b) include 
observation-based assessments of teacher and principal performance at multiple points in the 
year. Table 2 shows the frequency of the various measures as they appeared across proposals and 
how often one measure was combined with another. 
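
A co-occurrence table of this kind can be tabulated from coded proposals along the following lines. This is a sketch under assumptions: the three proposals and their measure labels are invented for illustration and are not the coded TIF data.

```python
from collections import Counter
from itertools import combinations_with_replacement

def cooccurrence_counts(proposals):
    """Count how many proposals use each measure and each pair of measures."""
    counts = Counter()
    for measures in proposals:
        # Pairing a measure with itself yields the diagonal (total usage).
        for a, b in combinations_with_replacement(sorted(measures), 2):
            counts[(a, b)] += 1
    return counts

# Three made-up proposals, each coded as a set of measures.
proposals = [
    {"achievement", "review", "pd"},
    {"achievement", "review"},
    {"achievement", "pd", "coaching"},
]
counts = cooccurrence_counts(proposals)
print(counts[("achievement", "achievement")])  # 3 proposals use achievement
print(counts[("achievement", "review")])       # 2 pair achievement with review
print(counts[("pd", "review")])                # 1 pairs PD with review
```

Keys are stored in alphabetical order, so each pair is counted once per proposal, which corresponds to a lower-triangular layout.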

Looking at the third row of the first column in Table 2, we see that 20 proposals intended 
to pair student achievement with a professional development component. Similarly, 23 proposals 
paired student achievement measures with some form of performance review. The least 
frequently employed measure was leadership coaching (six) while the most commonly employed 
measure that was not explicitly mandated by the TIF application requirements was professional 
development (20). The number of measures included in a proposal ranged from two to 
five, with an average of three measures per proposal. We now turn to a discussion of the 
specific measures. 

Student achievement growth. The first row of Table 2 shows that 33 of the 34 proposals 
used some form of a student achievement measure (as required).³ Our coding rubric was originally 
intended to document the ways proposals overcame issues of accurately measuring student 
achievement growth; it included categories such as which value-added model was employed 
(e.g., gain model, persistence model, or non-parametric approach), how the model was 
parameterized (e.g., the number of lagged values and the selection of control variables), and how 
multiple tests were aggregated. We also hoped to learn how states determined which schools were 
included in the sample and how they selected their comparison group: Were the growth models 
applied to only the schools in the proposal or were all schools in the state included in the analysis? 

Despite the importance of these elements in modeling student growth (e.g., Ladd & 
Walsh, 2002), the vast majority of proposals were not specific enough to determine how 
applicants intended to convert student achievement into valid and reliable growth measures 
related to school leadership. In conflict with the explicitly stated student academic growth 
requirement of the grants, seven funded proposals failed to take any measure of student growth 
and instead used an absolute measure of student achievement (typically measured as the 
percentage of students above a cut-score). 

³ The one grant recipient that did not have a student achievement component for principal incentives has an existing 
incentive program based on school value-added and is using the TIF grant to incentivize other aspects of leadership, 
namely leadership training. 
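
The distinction between an absolute measure and a growth measure can be shown with a minimal sketch. The simple mean gain used here is the most basic growth calculation and stands in for the more elaborate value-added models named above; the scale scores, the cut score of 300, and the two schools are hypothetical.

```python
def percent_proficient(scores, cut_score):
    """Absolute measure: share of students at or above the cut score."""
    return 100 * sum(s >= cut_score for s in scores) / len(scores)

def mean_gain(prior_scores, current_scores):
    """Simple gain measure: average score change for the same students."""
    gains = [cur - pre for pre, cur in zip(prior_scores, current_scores)]
    return sum(gains) / len(gains)

# Hypothetical scale scores for the same four students in two years.
school_a = {"prior": [310, 320, 330, 340], "current": [312, 321, 331, 342]}
school_b = {"prior": [250, 260, 270, 280], "current": [270, 282, 291, 299]}
cut = 300

for name, s in (("A", school_a), ("B", school_b)):
    print(name,
          percent_proficient(s["current"], cut),
          mean_gain(s["prior"], s["current"]))
```

School A is 100% proficient with an average gain of 1.5 points, while School B is 0% proficient with an average gain of 20.5 points, so the two measures can rank the same schools in opposite order.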


Table 2. Counts of measures used in the 34 TIF proposals, as they appear in conjunction 
with other measures 

                          Student       Performance   Professional                Program
                          Achievement   Review        Development   Staffing      Facilitation  Coaching
Student Achievement       33
Performance Review        23            24
Professional Development  20            18            21
Staffing                  12            7             6             13
Program Facilitation      10            7             5             3             10
Coaching                  6             7             6             3             1             6


All applicants proposed using their state assessments to determine student growth, and 
several planned to contract with outside assessors to test students in traditionally non-tested 
grades or to implement benchmark assessments to measure progress mid-year. Some proposals 
note the use of additional, teacher-developed assessments. Teacher-developed exams can be 
useful for formative processes, informing teaching and learning, but using these tests with 
unknown psychometric properties as a growth metric could be problematic. On the whole, 
proposals did not identify how their locally developed assessments are to be factored into their 
growth model. None of the proposals that planned to use teacher-developed assessments made 
note of scaling and validity concerns, suggesting that many applicants may be unaware of the 
basic measurement requirements of multi-assessment growth models. 

Performance review. Twenty-four applications proposed integrating some form of a 
leadership performance review into their principal incentive system. Performance reviews 
included such components as supervisor evaluations, walk-throughs, and multi-source (e.g., 
parent or teacher) feedback. None of the proposals supplied information about the reliability or 
validity of these measures. That said, the format of the performance review component was often 
the most developed category across all the proposals, generally including multiple time points 
and multiple measures. Of the 24 proposals including performance reviews, 15 would conduct 
multiple reviews during the school year. Most proposals sought to incorporate multiple sources 
into the review process, including teacher perspectives, parent perspectives, observational 
components, conferences, and artifact (e.g., portfolio) evaluations. 

However, nearly all proposals suffered from common failings in their performance 
review program. As noted, most proposals include multiple assessments, often from multiple 
stakeholders over the year; however, few stipulated how these assessments should be converted 
into a final evaluation. In Florida, Orange County’s Recognizing Excellence in Achievement and 
Professionalism (REAP) program was an exception in stating that only the third (final) 
performance review would be considered in the incentive program. As with the student 
achievement component, we note that none of the proposals identified how the various 
performance review measures (e.g., teacher evaluations and observer evaluations) are weighted 
in the overall performance review. 
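
One way a proposal could make this conversion explicit is to state a weight for each review component and compute a weighted average. The sketch below assumes a 1-4 rubric scale, and the component names and weights are hypothetical; no TIF proposal specified them.

```python
def composite_review_score(component_scores, weights):
    """Combine review components into one score using explicit weights."""
    if set(component_scores) != set(weights):
        raise ValueError("each component needs exactly one weight")
    total_weight = sum(weights.values())
    weighted_sum = sum(component_scores[k] * weights[k] for k in weights)
    return weighted_sum / total_weight

# Hypothetical components on a 1-4 rubric scale with illustrative weights.
scores = {"supervisor": 3.0, "teacher_survey": 4.0, "parent_survey": 2.0}
weights = {"supervisor": 2, "teacher_survey": 1, "parent_survey": 1}
print(composite_review_score(scores, weights))  # 3.0
```

Stating the weights in this form would let principals see in advance how much each source of feedback contributes to the final evaluation.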

Professional development. Another measure included in many proposals was 
participation in leadership professional development. As with student achievement growth, we 
noted a lack of specificity in the proposals. Although professional development was a key 
component in 21 of the proposals, we found that applicants seldom explained how they would 
integrate professional development into the incentive system. That is, few proposals contained 
information explicating how professional development participation would be measured and 
rewarded, how frequently principals would be rewarded for their involvement in professional 
development, whether participants would be responsible for producing a deliverable product 
upon completion of the professional development program, or who was responsible for 
evaluating the principals’ participation. 

Several proposals presented professional development as a core component of the 
incentive system, selecting professional development components to complement the needs of 
the school. Such programs tended to focus on one key area of development, such as cultivating 
professional learning communities, and sustained this focus on professional development 
through multiple meetings across the school year. The proposals that designated professional 
development as their primary incentive mechanism were quite thorough in explaining the 
programmatic rationale and structure of the professional development, yet they fell short of 
articulating how they would integrate it into the incentive program and what the expectations 
were, beyond simply participating in the professional development. There was little to no 
articulation of who was tracking participation and what levels and types of participation would 
merit the proposed incentives. 

Further, the majority of the proposals with professional development components offered 
to compensate principals for attending professional development without specifying its aim or 
alignment to the school or district needs. Nearly all proposals failed to distinguish between 
participation in district-mandated professional development programs and additional professional 
development opportunities offered as a result of the incentive program. 

In addition, meaningful measurement of professional development activities was lacking. 
Attempts to define what principals would be required to do were often vague: “Principals from 
all four districts will participate in Professional Learning Communities at a minimum of twice a 
year,” a New Mexico proposal noted. These omissions raise questions, such as what constitutes 
participation — attending most sessions or active involvement in all sessions? And, is 
participation reported by the principal, the group supplying the professional development, or a 
district representative? Notably, four proposals specified measures of professional development 
beyond “attendance” or “participation.” Cumberland County, North Carolina, for example, 
required principals to use professional development to create professional growth plans. 
Pittsburgh’s Principal Incentive Program included an assessment of applied professional 
development concepts, and Maryland’s FIRST program required principals to document how 
they implemented aspects of their professional development program in their school. 

Four of the 21 proposals using professional development as an incentive measure 
specified who would be measuring the professional development component; however, these 
plans were vague (e.g., district, steering committee, and supervisor) and provided no information 
as to training or preparation of the reviewers. This deficit was common across measures. 

Staffing. Designed to attract individuals to lead challenging schools that have difficulty 
attracting and retaining effective leaders, staffing incentives were included in 13 of the 34 
proposals. The measurement of the incentive occurs when a principal accepts a new position in a 
school meeting predefined criteria. One challenge to using staffing incentives is that, if the 
incentive amount is too small, the applicant pool does not change and the only people receiving 
the staffing bonuses are the people who would have worked in that position without the 
incentive. None of the proposals presented a compelling strategy (such as demonstrating superior 
prior results) to mitigate this threat or to ensure that the principal taking the staffing incentive 
presented evidence of any prior leadership effectiveness. 

Program facilitation. Ten of the proposals rewarded principals for some form of program 
facilitation or implementation. These included principals conducting classroom observations and 
evaluations, scheduling or taking attendance during teacher professional development, or 
reviewing teacher work products (performance logs, portfolios, or self-improvement plans) to 
facilitate the teacher portion of the incentive plan. Measurement of these program facilitation 
components takes place, presumably, when the principal submits the materials, although the 
timing of this process is not clear. Although it would be unrealistic to expect evidence in favor of 
the validity and reliability of the program facilitation measures, we expected some 
documentation regarding how program facilitation would be measured and who would be 
collecting this information. This information was uniformly absent from the 10 applications that 
proposed using program facilitation in their pay-for-performance plan. In sum, the applications 
provided no guidance to determine whether a principal provided exemplary, acceptable, or 
inadequate program facilitation. We question whether these types of measures would serve as 
compelling incentives for a principal to change behavior, as in some instances these components 
are already part of a principal’s responsibilities. 

Coaching. Six proposals incorporated participation in leadership coaching (or mentoring) 
in their incentive program. Each one provided incentives to principals who participated in 
coaching relationships, with meeting frequency ranging from weekly to monthly; two proposals 
did not include any information on coaching frequency. Three proposals explained how 
coaching would be evaluated, using attendance or deliverables as part of the measure, yet they 
were unclear as to who would collect and evaluate these measures. 

Overall, measures were incompletely defined and insufficiently developed to specify how 
they relate to measurement quality (validity and reliability), evaluator capacity, and measurement 
frequency. 

Rewarding Performance 

Performance rewards are a central component of any incentive plan, and we focused on 
three criteria of performance rewards: type, amount, and timing and frequency. We found no 
evidence that any applicants considered non-pecuniary rewards for school leaders. 

Participants must see the probability of receiving an award and the amount of the reward 
as sufficiently large to modify their behavior, but it should be noted that the probable reward 
differs in distinct ways from the possible reward. For example, a reward of $10,000 may be 
sufficient to modify behavior if the principal perceives the criteria to be reasonable and success 
to be probable. Alternatively, a reward of $80,000 or more may fail to motivate principals if the 
criteria are unattainable, such as an all-or-nothing goal to have 100% of students show 1.5 years’ 
growth in academic performance. For the purpose of this paper, we focus on the maximum 
possible award with the caveat that the actual probability and perceived probability of winning 
awards with similar maximum amounts will differ based on the program requirements. 
Accordingly, programs with similar maximum rewards will then differ in how well they motivate 
behavioral change. 
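
The distinction between the possible and the probable reward can be illustrated with a simple expected-value calculation. In this minimal sketch the probabilities of success are hypothetical assumptions chosen for illustration; the proposals report no such probabilities.

```python
def expected_reward(max_reward, p_success):
    """Probable reward: the maximum reward times the perceived chance of earning it."""
    if not 0.0 <= p_success <= 1.0:
        raise ValueError("p_success must be between 0 and 1")
    return max_reward * p_success


# Hypothetical perceived probabilities (assumptions, not data from the proposals).
attainable = expected_reward(10_000, 0.60)    # reasonable criteria, success likely
unattainable = expected_reward(80_000, 0.01)  # all-or-nothing goal, success unlikely
```

Under these assumed probabilities, the smaller but attainable award carries the larger expected value, which is the sense in which programs with very different maximum awards can differ in how well they motivate.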


Table 3. Incentive rewards

| Measurement Domain | Average Maximum Bonus | Average Percentage of the Maximum Possible Bonus |
|---|---|---|
| Overall | $11,800 | |
| Achievement | $6,450 | 62% |
| Professional Development | $950 | 18% |
| Performance Review | $1,000 | 24% |
| Staffing | $7,700 | 57% |
| Coaching | $750 | 14% |
| Program Facilitation | $1,200 | 15% |


Table 3 shows the average maximum possible incentive proposed overall and by 
measurement domain as well as how rewards from each measurement domain fit into the overall 
reward structure. The maximum possible bonus is determined by summing all the maximum 
rewards for each reward category (e.g., growth in student academic achievement, professional 
development) across the categories presented in each grant. Here we see that, of the proposals 
that included professional development, the average maximum possible reward was $950, 
representing 18% of total (average maximum possible) incentive rewards. From these data it is 
clear that incentives for student achievement and staffing (when staffing incentives are included) 
dominate the reward structure. Of the 28 proposals that reported a maximum possible incentive, 
five were greater than $15,000 and four were less than $5,000. Forty percent of public school 
principals earn $80,000 or less per year (Goldring, Gray, & Bitterman, 2014); for these 
individuals, $11,800 represents about 15% or more of their base income. The median maximum 
bonus of $10,000 was slightly below the mean, reflecting the modest skew from four proposals 
with maximum rewards in excess of $20,000. Assuming the benchmarks are perceived as 
attainable and the participants are well informed about the program, these incentives appear 
sufficiently large to induce behavioral change. 
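
The arithmetic behind the maximum possible bonus and each domain's share of it can be sketched as follows. The dollar amounts are hypothetical and describe no particular proposal.

```python
def max_possible_bonus(category_maxima):
    """Sum the maximum reward available in each reward category."""
    return sum(category_maxima.values())


def category_shares(category_maxima):
    """Each category's maximum as a fraction of the maximum possible bonus."""
    total = max_possible_bonus(category_maxima)
    return {name: amount / total for name, amount in category_maxima.items()}


# Hypothetical proposal with three reward categories.
proposal = {
    "student_achievement": 6_000,
    "professional_development": 1_000,
    "performance_review": 1_000,
}
total_bonus = max_possible_bonus(proposal)
shares = category_shares(proposal)
```

Because each domain's average in Table 3 is presumably taken only over the proposals that include that domain, the percentages reported there need not sum to 100%.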

The majority of programs failed to define payment frequency. Only one of the 21 programs 
that used professional development specified payment frequency (once per year). One other program 
used professional development as a gateway measure only, with no payment. One interesting 
example regarding the timing of rewards can be seen in a proposal that planned to give principals 
25% of their bonuses once their schools have been identified as high achieving or rapidly improving, 
with the remaining 75% of the bonus to be delivered after principals have worked with a 
development team to identify and share best practices they implemented in their schools. 

Program Structure 

Our framework defines competitive structure as being criterion referenced, asking 
participants to meet a predefined proficiency bar, or norm referenced, asking participants to be 
among the best of the competitors. Almost without exception, performance measures were 
criterion referenced, which can make budgeting challenging, yet fits well into the professional 
culture of the education sector, which is heavily invested in professional standards and 
collaborative in nature. 

Linking performance measures to rewards involves articulating how a score on a measure 
translates into a reward. One way to think about this linking process is to consider which 
principals are eligible to receive a reward and under what conditions. How the proposals 
described these considerations has implications for how participants respond to the program and 
how the proposals are funded. For instance, a key consideration is whether the awards will be 
distributed to a set proportion of principals (e.g., the top 10%) or whether all principals who meet 
or exceed set standards will be eligible for rewards. We note that all 34 proposals were structured 
in the latter form, where all principals could receive bonuses. While principals may be more 
likely to participate when everyone can conceivably win, financing for such plans requires 
accurate forecasting of the proportion of principals who are expected to exceed the benchmark. If 
the incentive plans call for measures currently used in the district, historical trends can be used to 
approximate reasonable growth as a result of the inclusion of incentives, but the proposals 
seldom included such calculations. 
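
A calculation of this kind could be as simple as the sketch below, which contrasts the forecast cost of a criterion-referenced plan with the fixed cost of a norm-referenced one. The district size, bonus amount, historical pass rate, and assumed uplift are all hypothetical values.

```python
def forecast_criterion_cost(n_principals, bonus, historical_pass_rate, expected_uplift=0.0):
    """Expected payout when every principal who meets the benchmark is paid."""
    # Clamp the projected pass rate to the valid range [0, 1].
    projected_rate = min(1.0, max(0.0, historical_pass_rate + expected_uplift))
    return n_principals * projected_rate * bonus


def norm_referenced_cost(n_principals, bonus, top_percent):
    """Payout when a fixed share (e.g., the top 10%) is paid; known in advance."""
    n_winners = n_principals * top_percent // 100  # whole principals only
    return n_winners * bonus


# Hypothetical district: 40 principals, $10,000 bonus, 30% met the bar historically,
# and incentives are assumed to raise that share by 10 percentage points.
criterion_cost = forecast_criterion_cost(40, 10_000, 0.30, expected_uplift=0.10)
norm_cost = norm_referenced_cost(40, 10_000, top_percent=10)
```

The criterion-referenced figure is only as good as the assumed pass rate and uplift, whereas the norm-referenced cost is fixed once the share of winners is chosen.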

Across the proposals, we saw evidence of three common strategies to link student 
achievement to principal rewards: gateway links, scaling, and all-or-nothing. As shown in Table 
4, six of the proposals did not provide a linking strategy for the student achievement portion of 
their performance plan. “Gateway” linkages allow principals to be eligible for other incentive 
rewards if they meet some predefined performance criteria in another domain. A gateway link 
could require that the school meet a minimum progress benchmark before the principal is eligible 
for the professional development and student growth awards. As illustrated by this example, 
gateway measures in a given domain are not necessarily mutually exclusive with scaling or all- 
or-nothing rewards. Table 4 shows the various linking systems in the TIF proposals identified by 
measure. 


Table 4. Frequency of use for strategies linking measures to rewards

| Measure | Scaling | All-or-Nothing | Gateway | Omitted |
|---|---|---|---|---|
| Student Achievement Growth | 17 | 8 | 7 | 6 |
| Professional Development | 2 | 2 | 4 | 13 |
| Performance Review | 4 | 4 | 4 | 12 |
| Staffing | 1 | 6 | 3 | 1 |
| Coaching | 1 | 1 | 0 | 4 |
| Program Facilitation | 3 | 3 | 3 | 2 |
| Total | 28 | 24 | 21 | 38 |


The achievement rewards that used scaling all employed some sort of linear scale. This 
approach may be intuitive to understand, yet such connections do not accommodate situations 
where there are diminishing returns to higher performance, such as when a one-unit change is 
more difficult at one area of the measure (moving from 99% to 100% of students proficient, for 
example) as compared to another area (moving from 67% to 68% of students proficient). 
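
The contrast between a linear scale and one that pays more where gains are harder can be sketched as follows. The per-point rates and band boundaries are hypothetical; no proposal specified such a schedule.

```python
def linear_reward(pct_proficient, dollars_per_point=100.0):
    """Every percentage point earns the same amount, wherever it falls on the scale."""
    return dollars_per_point * pct_proficient


def piecewise_reward(pct_proficient, tiers=((0, 50.0), (80, 150.0), (95, 500.0))):
    """Pay a higher rate per point in higher bands, where gains are harder to achieve.

    Each tier is (band_start, dollars_per_point); a band ends where the next begins,
    and the last band ends at 100.
    """
    reward = 0.0
    for i, (start, rate) in enumerate(tiers):
        end = tiers[i + 1][0] if i + 1 < len(tiers) else 100
        if pct_proficient > start:
            reward += (min(pct_proficient, end) - start) * rate
    return reward


# Value of one additional point at two places on the scale.
linear_low = linear_reward(68) - linear_reward(67)
linear_high = linear_reward(100) - linear_reward(99)
tiered_low = piecewise_reward(68) - piecewise_reward(67)
tiered_high = piecewise_reward(100) - piecewise_reward(99)
```

Under the linear scale both one-point gains pay the same amount; under the tiered scale the gain from 99% to 100% pays more than the gain from 67% to 68%.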

Strategies for linking professional development and performance reviews to rewards were 
often omitted from the proposals. When present, connections were often vague, providing, for 
example, a reward amount for attending the required professional development sessions, but not 
specifying whether districts gave rewards for attending more or fewer sessions. When they were 
scaled, professional development rewards increased with the number of sessions attended, rather 
than hinging on a deliverable. Performance reviews tended to be scaled based on the number of 
goals met or by fulfilling the criteria in various aspects of the review process. Expectedly, 
staffing links were often all-or-nothing, with principals receiving bonuses for working in selected 
schools for predefined periods of time. 

All applicants used an aggregation approach to link across measures within a given 
proposal (e.g., linking student academic growth to professional development and performance 
reviews). In this approach, applicants would determine the bonus for each measure and sum the 
total. For example, a principal might be deemed eligible for a bonus by receiving a mark of 
“superior” on his or her performance review (a gateway link); receive $200 for each percentage 
point of the 22% of students above proficient, earning $4,400 (a linear scaling link); and receive 
$2,000 for engaging in professional development (an all-or-nothing link). The overall 
aggregation of performance review, student achievement growth, and professional development 
measures would award this principal $6,400. 
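
The worked example above can be expressed as a short calculation. This is a minimal sketch of the aggregation approach using the dollar figures from the example; the function and parameter names are hypothetical.

```python
def aggregate_bonus(review_rating, pct_above_proficient, engaged_in_pd,
                    dollars_per_point=200, pd_bonus=2_000):
    """Sum the bonuses earned on each measure, subject to a gateway condition."""
    # Gateway link: no bonus is available unless the review rating is "superior".
    if review_rating != "superior":
        return 0
    # Linear scaling link: a fixed amount per percentage point above proficient.
    achievement_bonus = dollars_per_point * pct_above_proficient
    # All-or-nothing link: the full professional development bonus or none of it.
    pd_award = pd_bonus if engaged_in_pd else 0
    return achievement_bonus + pd_award


total = aggregate_bonus("superior", 22, True)          # 200 * 22 + 2,000 = 6,400
blocked = aggregate_bonus("satisfactory", 22, True)    # gateway not met, so 0
```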

Discussion 

This review studied the ways practitioners have conceived of alternative compensation 
and incentive structures for school leaders as proposed for the TIF grant. Based on a synthesis of 
the theoretical and empirical research on incentive programs, our review is organized around 
considerations of (a) measurement, (b) rewards, and (c) program structure. 
An application of the framework we develop here may help districts think more strategically 
about how to develop pay-for-performance plans or help them seek out collaborators in 
designing the incentive system. The remainder of this paper examines common alignment to and 
divergence from our incentive framework, suggesting solutions and implications. 

Most proposals tended to omit or erratically speak to the psychometric, statistical, and 
logistic design elements of the proposed incentive system. District personnel, who are largely 
responsible for the design of incentive systems, are likely to be well versed in aspects of school 
leadership, local context, and professional development, but they may not have the knowledge to 
devise an effective incentive plan. 

Leadership performance measures should be valid, reliable, linked directly to desired 
goals, and based on more than one metric. All 34 proposals included multiple measures as 
part of their incentive plans. Nearly all the proposals failed to address the validity or reliability of 
these measures. With respect to student achievement growth, many of the proposals’ authors 
appeared unaware of the literature regarding the complexities and trade-offs involved when 
modeling the value teachers and schools add to student academic performance. In examining 
how selected measures factor into an incentive plan, we identified widespread failings of 
proposals to explain how multiple measures would be weighted relative to other measures and 
how they should be aggregated over time. These oversights exemplify the psychometric, 
statistical, and logistic hurdles that districts commonly encounter when selecting incentive 
measures. 

Districts that neglect instrument validity risk developing a biased incentive system that 
systematically favors some principals over others as well as unintentionally rewarding one 
behavior while intending to reward a different behavior. When measurement reliability is 
unknown, the measures may be unbiased, but measurement may be erratic, making decisions 
based on these measures challenging. If principals feel that measures are unfair or haphazard, 
they may reject the incentive system out of hand. Instituting a program with measures of 
unknown reliability and validity could also undermine trust between school and district 
leadership, creating organizational discord that is difficult to rectify. 

When we examined the domains that the proposals sought to measure (e.g., professional 
development, performance review) and how these domains were measured (e.g., attendance, 
observations), we found that performance goals set by the proposals were not substantially 
different from the expected status quo for principal performance. This finding raises the question, 
“To what extent did these proposals simply create additional rewards for what principals are 
already doing?” With the vast majority of principals reporting ratings at or above the satisfactory 
level (Reeves, 2009), the majority of principals likely will receive incentive bonuses for the 
performance review portion, without any modification of their professional practice. 

In most districts, principals are expected to maintain and expand their leadership skills 
through continuing professional development. Identifying a promising professional development 
program that meets the district’s needs is insufficient rationale for instituting an incentive system 
around professional development. Performance incentives should be applied when (a) improved 
performance in a given area can have a positive impact on a district’s goals, (b) there is reason to 
believe that principals are not putting forth optimal effort, and (c) existing incentives are 
insufficient to motivate behavioral change. With the exception of three proposals that outlined 
detailed professional development programs, none of these proposals were able to identify why 
and how incentives for professional development — which was likely taking place without 
incentives — would address the goals of the district. 

Creating an incentive system for leadership coaching raises similar questions regarding 
the clarity of the integration between the incentive system and district goals. It is not readily 
evident that principals should be rewarded for both reaching a goal and for undergoing coaching 
to reach that goal. If principals require a financial incentive to meet with a coach, this suggests 
that the outcome incentives (e.g., performance reviews, student achievement) may be inadequate. 
An alternative interpretation is that these applicants may have constructed a dual incentive system 
in response to severe challenges they face regarding the motivation of their school leaders. 

When creating an incentive program, schools and districts would benefit from clearly 
articulating the minimum expectations and then structuring incentives to support leadership 
outcomes beyond this baseline level. Incentive programs should identify key factors along the 
path to desired outcomes where districts feel performance is lacking. Additionally, incentive 
supports should be an explicit part of the incentive system: Outcome incentives may help 
principals identify what needs to change (e.g., student achievement), but it is the incentive 
supports that can show principals how to change (e.g., improving classroom observations). 

Given that the TIF program based 20% of each proposal’s evaluation on its confirmation 
that the district had adequate resources to ensure payouts, it would appear difficult to construct 
realistic cost projections without devoting some thought to how evaluations translate to dollars 
and considering the relative probability of the various pay-for-performance outcomes. Arriving 
at accurate predictions of incentive payouts, especially with the criterion-referenced reward 
structures favored by TIF applicants, requires the application of statistical skill that may not be 
present in all districts. When districts fail to set these benchmarks at the optimal level, they 
risk undermining leaders’ effort if too many or too few leaders qualify for awards. Failure to 
specify linkages also obscures the reward system and can create uncertainty regarding 
expectations. Explicitly stating the performance-to-reward linkages creates an incentive system 
that is fiscally sound as well as transparent to participants. 

Although the majority of proposals lacked novel ideas or innovative approaches, 
applicants from South Carolina proposed a simple, yet compelling method to scale their bonuses: 
“South Carolina will differentiate the amounts allocated per principal and assistant principal 
relative to free or reduced-price lunch percentages in the specific school, so disadvantaged/poor 
children will have equal access to effective leadership at the principal level” (quoted from the 
South Carolina TIF proposal). This strategy has the benefits of being simple and facilitating 
transparency. South Carolina’s scaling plan also functions as a proxy for signing bonuses by 
attracting individuals to high-need schools, and thus is a parsimonious design feature. 

Implementing underdeveloped programs will undermine support for the intervention and 
degrade potential outcomes. In addition to soliciting support from experts outside the district 
(e.g., consulting, research collaborations), several other strategies may help districts create more 
robust and comprehensive pay-for-performance plans. In part, this paper serves such a support 
function by outlining the key components of performance pay systems, and by identifying the 
benefits and trade-offs for the most prevalent design elements. Pay-for-performance plans should 
include timelines that identify which measures will be collected, when feedback will be 
provided, and when payments will be distributed. In addition to facilitating planning and 
implementation, the distribution of such timelines to the school leadership teams can improve 
communication and transparency among stakeholders. Last, districts may want to use 
simulations to model potential outcomes and then work backward, engineering incentive 
components to match desired outcomes. 
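
A simulation of this kind might look like the following minimal sketch, which models how many principals meet a benchmark and then works backward to the largest bonus a budget can support in 95% of simulated years. The number of principals, the probability of meeting the benchmark, and the budget are hypothetical assumptions, and treating principals as independent is a simplification.

```python
import random


def simulate_winner_counts(n_principals, p_meet, n_trials=5_000, seed=0):
    """Simulate how many principals meet the benchmark in each trial year."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    counts = []
    for _ in range(n_trials):
        counts.append(sum(rng.random() < p_meet for _ in range(n_principals)))
    return sorted(counts)


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted list (pct from 1 to 100)."""
    rank = max(1, (len(sorted_values) * pct + 99) // 100)  # ceiling division
    return sorted_values[rank - 1]


# Hypothetical district: 40 principals, each assumed 40% likely to meet the benchmark.
counts = simulate_winner_counts(n_principals=40, p_meet=0.4)
p95_winners = percentile(counts, 95)

# Work backward: largest per-principal bonus that stays within a hypothetical
# budget in 95% of the simulated years.
budget = 250_000
max_affordable_bonus = budget // max(1, p95_winners)
```

A district could vary the assumed probability or the benchmark itself and rerun the simulation to see how sensitive the affordable bonus is to those choices.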


Conclusion 

A systematic review of the 34 proposals approved through the TIF program reveals that 
almost all are substantially underdeveloped, demonstrating fundamental misunderstandings of 
the design of incentive programs that could bear fruit to improve school leadership. This finding 
suggests that developing a high-quality performance incentive system for school leaders is 
neither simple nor self-evident. Many districts, it appears, may not have the capacity to construct 
such systems or, if they have the capacity within the organization, have failed to access the 
expertise needed to develop their plans. Poorly formulated incentive programs will, at best, be 
inefficient; at worst these programs can motivate behaviors that are counterproductive to the 
desired outcomes. In addition, research on these types of plans is likely to find no effect on 
school leadership simply because the programs are insufficiently specified and underdeveloped. 
The framework set forth in this paper should give districts a starting point for collaboration and 
discussion when considering how to develop their own performance incentive systems. 

Appendix 1. Features of Teacher Incentive Fund proposals

| Program | State | City (or District) | # of Schools | School Type | Incentive Maximum | Student Achievement | Professional Development | Receiving Coaching | Program Facilitation | Performance Review | Staffing | Other |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Alaska | AK | Three districts statewide | 27 | rural, high-need | 28% | X | X | | X | | | X |
| Chicago (REAL) | IL | Chicago Public Schools | 40 | urban | $5,000 | X | | | X | | | |
| Cumberland County | NC | Douglas Byrd district | 5 | urban | $5,000 | X | X | | X | X | | X |
| Dallas | TX | Dallas Independent School District | 220 | urban | $10,000 | X | X | | | X | X | |
| Denver | CO | Denver Public Schools | 150 | urban | $43,250 | X | X | | | X | X | X |
| Eagle County CO | CO | Eagle County Public Schools | 13 | rural, high-need | | | X | | | X | | |
| Financial Incentive Rewards for Supervisors and Teachers | MD | Prince George's County Public Schools | 15 | urban | $12,500 | X | X | | | X | | |
| Houston: Project SMART | TX | Houston Independent School District | 109 | urban | $3,000 | X | | | | X | | |
| Leadership for Educators’ Advanced Performance (LEAP) | NC | Charlotte-Mecklenburg | 16 | urban, suburban | $10,000 | X | X | | | X | X | |
| Memphis Effective Practice Incentive Fund | TN | Memphis City Schools | 17 | urban | $15,000 | X | | | | | X | X |
| Mission Possible: Guilford County NC | NC | Guilford County Schools | 7 | high-need | $15,000 | X | X | | | | X | |
| MIT Academy | CA | MIT Academy (Middle and High Charter Schools in Vallejo) | 2 | charter | $37,050 | X | | X | | X | X | |
| New Leaders New Schools National Charter Project Effective Practice Incentive Fund | - | nationwide | 47 | charter | $20,000 | X | | X | | | X | X |
| New Mexico | NM | Espanola, Springer, Des Moines, Cimarron school districts | omitted | rural, high-need | | X | X | | | X | | |
| Ohio Teacher Incentive Fund | OH | Cincinnati, Cleveland, Columbus, Toledo | omitted | urban | $2,000 | X | X | | | X | | |
| Partnership for Innovation in Compensation for Charter Schools | NY | New York City | 10 | charter | $8,000 | X | | | X | | | |
| Performance Outcomes With Effective Rewards | FL | Hillsborough County | 116 | urban/suburban | 5% | X | | | | X | X | |
| Philadelphia | PA | School District of Philadelphia | 20 | urban | | X | X | | | X | | |
| Pittsburgh's Principal Incentive Program | PA | Pittsburgh | 65 | urban | $12,000 | X | X | | | X | | |
| Project Excel | AZ | Tucson (Amphitheater Unified) | 11 | nine high-need urban, two rural | $10,000 | X | X | X | | | | X |
| Quest for Success | CA | Lynwood County | 18 | urban | $9,700 | X | X | | | X | | |
| Recognizing Engagement in the Advancement of Learning | CO | Harrison School District 2 | 25 | urban | $2,000 | X | | X | | | | |
| Recognizing Excellence in Achievement and Professionalism | FL | Orange County Public Schools | 10 | urban | $5,000 | X | X | | | X | | |
| Rewards and Incentives for School Educators | FL | Miami-Dade | 36 | urban | $1,000 | X | X | X | | | | |
| Schools Under Performance Pay Offer Remarkable Teaching | FL | Lake County | 10 | unknown | $10,000 | X | X | X | | X | | |
| System to Motivate and Reward Teachers | OK | Beggs | 3 | rural | $11,000 | X | | X | X | | X | |
| South Carolina | SC | Statewide (six districts) | 23 | rural, high-need | $23,000 | X | | | X | X | | |
| South Carolina Teacher Incentive Fund | SC | Florence & Laurens | 6 | rural | $6,000 | X | | | | X | | |
| South Dakota Incentive Fund | SD | Statewide (11 districts) | 30 | Title 1 | $6,000 | X | X | | | X | | |
| Teacher and Principals Awarded for Student Achievement | TX | San Antonio | 6 | urban | | X | X | | X | X | | |
| Teacher Excellence Incentive Project | MA | Boston | 1 | urban | $10,000 | X | | | | | X | |
| University of Texas System | TX | Statewide (seven districts) | 27 | disadvantaged | $8,000 | X | | | | | X | |
| Washington, D.C., Effective Practice Incentive Fund | DC | District of Columbia Public Schools | 25 | urban | $22,250 | X | | | | | X | X |
| Weld County | CO | Weld County School District RE-8 (Fort Lupton) | 4 | rural, high-need | 3% | X | | | X | X | | |